Learning Policies for Data Imputation with Guided Policy Search
Authors
Abstract
We explore the relationship between directed generative models and reinforcement learning by developing a new approach to data imputation that combines ideas from both areas. We address data imputation by defining an MDP for which we construct policies parametrized by (reasonably) large neural networks. We then show how to train these policies using a form of (self) Guided Policy Search (Levine & Koltun, 2013a), which leads to maximizing a variational bound on the quality of the imputations made by our policies. Empirically, our policies perform well over a range of conditions.
Similar resources
Data Generation as Sequential Decision Making
We connect a broad class of generative models through their shared reliance on sequential decision making. Motivated by this view, we develop extensions to an existing model, and then explore the idea further in the context of data imputation – perhaps the simplest setting in which to investigate the relation between unconditional and conditional generative modelling. We formulate data imputati...
Guided Policy Search as Approximate Mirror Descent
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...
Guided Policy Search via Approximate Mirror Descent
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...
Guided Policy Search
Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynam...
End-to-End Training of Deep Visuomotor Policies
Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training...
Publication date: 2015